Response to Hyman

Daryl J. Bem

R. Hyman (1994) raises two major points about D. J. Bem and C. Honorton’s (1994) article on the psi ganzfeld experiments. First, he challenges the claim that the results of the autoganzfeld experiments are consistent with the earlier database. Second, he expresses concerns about the adequacy of the randomization procedures. In response to the first point, I argue that our claims about the consistency of the autoganzfeld results with the earlier database are quite modest and challenge his counterclaim that the results are inconsistent with it. In response to his methodological point, I present new analyses that should allay apprehensions about the adequacy of the randomization procedures.

I am grateful to Richard Broughton of the Institute for Parapsychology in Durham, North Carolina, for going through the original autoganzfeld computer files with me to unearth the data necessary for the additional analyses presented in this response.

Correspondence concerning this article should be addressed to Daryl J. Bem, Department of Psychology, Uris Hall, Cornell University, Ithaca, New York 14853. Electronic mail may be sent to d.bem@cornell.edu.

I am pleased that Ray Hyman, one of parapsychology’s most knowledgeable and skeptical critics, concurs with Charles Honorton and me on so many aspects of the autoganzfeld experiments: the soundness of their methodology, the clear rejection of the null hypothesis, and, of course, the need for further replication. I hope this brief response will further augment our areas of agreement.

Hyman raises two major points about our article. First, he challenges our claim that the results of the autoganzfeld studies are consistent with those in the earlier database. Second, he expresses concerns about the “incomplete justification of the adequacy of the randomization procedures” and speculates that inadequate randomization may have interacted with subject or experimenter response biases to produce artifactual results.


Consistency With the Earlier Database


The earlier ganzfeld database comprised studies whose methods and results were quite heterogeneous. Consequently, one cannot justify any strong claims that some subsequent finding is either consistent or inconsistent with that database. For this reason, Honorton and I were careful not to make such claims. With regard to the major finding, we simply observed that earlier studies had achieved an overall hit rate of about 33% (25% would be expected by chance) and noted that the autoganzfeld experiments achieved approximately the same effect size. End of claim.

In general, the earlier database served primarily to suggest the kinds of variables that needed to be examined more systematically or more rigorously in the new studies. For example, previous ganzfeld studies that had used multi-image View Master slide reels as target stimuli obtained significantly higher hit rates than did studies that had used single-image photographs. This
finding prompted Honorton and his colleagues to include both 
video film clips and single-image photographs in the autoganz- 
feld experiments to determine whether the former were supe- 
rior. They were. Our only claim about methodological compa- 
rability was the modest observation that “by adding motion and 
sound, the video clips might be thought of as high-tech versions 
of the View Master reels.” 

But Hyman argues at length that video clips are not really 
like View Master reels. Surely this is a matter of interpretation, 
but does it really matter? Usually in psychology, successful con- 
ceptual replications inspire more confidence about the reality 
of the underlying phenomenon than do exact replications. I believe that to be the case here.

An example of a variable selected from the earlier database 
for more rigorous reexamination was sender-receiver pairing. 
Previous ganzfeld studies that permitted receivers to bring in 
friends to serve as senders obtained significantly higher hit rates 
than did studies that used only laboratory-assigned senders. But 
as we emphasized in our article, “there is no record of how 
many participants in the former studies actually brought in 
friends,” and hence these studies do not provide a clean test of 
the sender-receiver variable. Moreover, the two kinds of studies 
differed on many other variables as well. 

In the autoganzfeld studies, all participants were free to bring 
in friends, and it was found that sender-receiver pairs who were 
friends did, in fact, achieve higher hit rates than did sender-receiver pairs who were not friends (35% vs. 29%). But the reliability of this finding is equivocal. In the archival publication of
the autoganzfeld studies, Honorton et al. (1990) presented this 
finding as a marginally significant point-biserial correlation of 
.36 (p = .06). In our article, however, we chose to apply Fisher’s 
exact test to the hit rates themselves. Because this yielded a non- 
significant p value, we thought it prudent simply to conclude 
that “sender-receiver pairing was not a significant correlate of 
psi performance in the autoganzfeld studies.” 

But to Hyman, “this failure to get significance is a noteworthy 
inconsistency.” (In part, he makes it appear more inconsistent 
than it is by erroneously stating that the earlier database yielded 
a significant difference in performance between friend pairs and 
nonfriend pairs. As noted earlier, this is an indirect inference at 
best.) 
I submit that Hyman is using a double standard here. If the successful replication of the relation between target type and psi performance is not analogous to the earlier finding with the View Master reels, then why is this near miss with a methodologically cleaner assessment of the sender-receiver variable a “noteworthy inconsistency”?

Hyman cannot have it both ways. If the heterogeneity of the 
original database and the methodological dissimilarities be- 
tween its variables and those in the autoganzfeld studies pre- 
clude strong claims of consistency, then these same factors pre- 
clude strong claims of inconsistency. 


Randomization 


As we noted in our article, the issue of target randomization 
is critical in many psi experiments because systematic patterns 
in inadequately randomized target sequences might be detected 
by subjects during a session or might match their preexisting 
response biases. In a ganzfeld study, however, randomization is 
less problematic because only one target is selected during the 
session and most subjects serve in only one session. The primary 
concern is simply that all the stimuli within each judging set be sampled uniformly over the course of the study. Similar considerations govern the second randomization, which takes place
after the ganzfeld period and determines the sequence in which 
the target and decoys are presented to the receiver for judging. 

In the 10 basic autoganzfeld experiments, 160 film clips were 
sampled for a total of 329 sessions; accordingly, a particular clip 
would be expected to appear as the target in only about 2 ses- 
sions. This low expected frequency means that it is not possible 
to statistically assess the randomness of the actual distribution 
observed. Accordingly, Honorton et al. (1990) ran several large- 
scale control series to test the output of the random number 
generator. These control series confirmed that it was providing
a uniform distribution of values through the full target range. 
Statistical tests that could legitimately be performed on the ac- 
tual frequencies observed confirmed that targets were, on aver- 
age, selected uniformly from among the four film clips within 
each judging set and that the four possible judging sequences 
were uniformly distributed across the sessions. 
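As a rough illustration of why the per-clip distribution cannot be tested, the following Python sketch computes the expected number of appearances per clip from the figures stated above (160 clips, 329 sessions) and compares it with the conventional rule of thumb that a chi-square goodness-of-fit test needs expected counts of at least 5. The threshold and the function names are assumptions for illustration, not part of the original analyses; pooling the sessions by within-set slot is likewise only an assumed example of a test that remains feasible.

```python
# Hypothetical sketch: can a chi-square goodness-of-fit test be applied to
# the observed target frequencies? Figures (160 clips, 329 sessions) are from
# the text; the "expected count >= 5" threshold is a conventional rule of thumb.

N_SESSIONS = 329
N_CLIPS = 160
CLIPS_PER_SET = 4
MIN_EXPECTED = 5  # assumed rule of thumb, not from the article


def expected_count(n_sessions: int, n_categories: int) -> float:
    """Expected frequency per category if sessions were spread uniformly."""
    return n_sessions / n_categories


def chi_square_feasible(n_sessions: int, n_categories: int,
                        min_expected: float = MIN_EXPECTED) -> bool:
    """True if each category's expected count meets the rule-of-thumb minimum."""
    return expected_count(n_sessions, n_categories) >= min_expected


# Per-clip test: about 2 expected appearances per clip, far too few.
print(f"{expected_count(N_SESSIONS, N_CLIPS):.2f}",
      chi_square_feasible(N_SESSIONS, N_CLIPS))        # 2.06 False
# Pooled by within-set slot (an assumed example): about 82 per slot.
print(f"{expected_count(N_SESSIONS, CLIPS_PER_SET):.2f}",
      chi_square_feasible(N_SESSIONS, CLIPS_PER_SET))  # 82.25 True
```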

Nevertheless, Hyman remains legitimately concerned about 
the adequacy of the randomizations and their potential interac- 
tions with possible receiver or experimenter response biases. 
Two kinds of response bias are involved: differential preferences
for video clips on the basis of their content and differential pref- 
erences for clips on the basis of their position in the judging 
sequence. 


Content-Related Response Bias 


Because the adequacy of target randomization cannot be sta- 
tistically assessed owing to the low expected frequencies, the 
possibility remains open that an unequal distribution of targets 
could interact with receivers’ content preferences to produce 
artifactually high hit rates. As we reported in our article, Hon- 
orton and I encountered this problem in an autoganzfeld study 
that used a single judging set for all sessions (Study 302), a prob- 
lem we dealt with in two ways. To respond to Hyman’s concerns, 
I have now performed the same two analyses on the remainder 


of the database. Both treat the four-clip judging set as the unit 
of analysis, and neither requires the assumption that the null 
baseline is fixed at 25% or at any other particular value. 

In the first analysis, the actual target frequencies observed are 
used in conjunction with receivers’ actual judgments to derive 
a new, empirical baseline for each judging set. In particular, I 
multiplied the proportion of times each clip in a set was the 
target by the proportion of times that a receiver rated it as the 
target. This product represents the probability that a receiver 
would score a hit if there were no psi effect. The sum of these 
products across the four clips in the set thus constitutes the empirical null baseline for that set. Next, I computed Cohen’s measure of effect size (h) on the difference between the overall hit
rate observed within that set and this empirical baseline. For 
purposes of comparison, I then reconverted Cohen’s h back to 
its equivalent hit rate for a uniformly distributed judging set in 
which the null baseline would, in fact, be 25%. 
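The first analysis can be sketched in Python as follows. The function names and the example proportions are hypothetical; only the formulas follow the description above: the empirical baseline is the sum over the four clips of (proportion of sessions in which the clip was the target) times (proportion of sessions in which receivers rated it as the target), Cohen’s h is the difference between arcsine-transformed proportions, 2·arcsin√p₁ − 2·arcsin√p₂, and the reconversion finds the hit rate that would yield the same h against a 25% baseline.

```python
import math


def empirical_baseline(p_target, p_rated):
    """Expected hit rate with no psi: sum over clips of
    P(clip is the target) * P(receiver rates the clip as the target)."""
    if len(p_target) != len(p_rated):
        raise ValueError("need one target and one rating proportion per clip")
    return sum(t * r for t, r in zip(p_target, p_rated))


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def equivalent_hit_rate(h, baseline=0.25):
    """Hit rate that would give effect size h against the given baseline.
    Inverts h = 2*asin(sqrt(x)) - 2*asin(sqrt(baseline)) for x."""
    phi = h / 2 + math.asin(math.sqrt(baseline))
    phi = min(max(phi, 0.0), math.pi / 2)  # keep within the arcsine range
    return math.sin(phi) ** 2


# Hypothetical judging set: clip A is over-represented as a target and
# also over-chosen by receivers, so the empirical baseline exceeds 25%.
p_target = [0.40, 0.20, 0.20, 0.20]
p_rated = [0.40, 0.20, 0.20, 0.20]
observed_hit_rate = 0.35

baseline = empirical_baseline(p_target, p_rated)  # about 0.28
h = cohens_h(observed_hit_rate, baseline)
adjusted = equivalent_hit_rate(h)                 # lower than 0.35
print(round(baseline, 2), round(h, 3), round(adjusted, 3))
```

If either list is uniform (all .25) and the other sums to 1, the baseline is exactly 25%, which is why unequal target frequencies can inflate the hit rate only in conjunction with content preferences.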

Across the 40 sets, the mean unadjusted hit rate was 31.5%, significantly higher than 25%, one-sample t(39) = 2.44, p = .01, one-tailed. The new, bias-adjusted hit rate was virtually identical (30.7%), t(39) = 2.37, p = .01, and the difference between the two rates was not significant, t(39) = 0.85, p = .40, indicating that unequal target frequencies were not significantly inflating the hit rate.
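The across-set comparisons above are ordinary one-sample and paired t tests with 39 degrees of freedom (40 judging sets). A minimal standard-library sketch of the two statistics follows; the per-set hit rates in the example are invented for illustration, and p values would still have to be read from the t distribution (for example, with a statistics package), which this sketch does not do.

```python
import math
import statistics


def one_sample_t(values, mu):
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / se, n - 1


def paired_t(xs, ys):
    """Paired t test, computed as a one-sample test on the differences."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return one_sample_t(diffs, 0.0)


# Invented per-set hit rates (unadjusted and bias-adjusted) for a few sets.
unadjusted = [0.375, 0.25, 0.50, 0.125, 0.333]
adjusted = [0.360, 0.25, 0.47, 0.130, 0.320]

t_vs_chance, df = one_sample_t(unadjusted, 0.25)
t_diff, df_diff = paired_t(unadjusted, adjusted)
print(f"t({df}) = {t_vs_chance:.2f}; difference t({df_diff}) = {t_diff:.2f}")
```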

The second analysis treats each film clip as its own control by 
comparing the proportion of times it was rated as the target 
when it actually was the target and the proportion of times it was 
rated as the target when it was one of the decoys. This procedure 
automatically cancels out any content-related target prefer- 
ences that receivers (or experimenters) might have. First, I cal- 
culated these two proportions for each clip and then averaged 
them across the four clips within each judging set. The results 
show that across the 40 judging sets, clips were rated as targets 
significantly more frequently when they were targets than when 
they were decoys (29% and 22%, respectively), paired t(39) = 2.03, p = .025, one-tailed. Both of these analyses indicate that
the observed psi effect cannot be attributed to the conjunction 
of unequal target distributions and content-related response bi- 
ases. 
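The second analysis can be sketched the same way. In the hypothetical Python function below, each session in a judging set is recorded as a (target, chosen) pair, where chosen is the clip the receiver rated as the target; this data layout, and the choice to skip a clip that never occurred in one of the two roles, are assumptions made for illustration. The first example shows why the procedure cancels content preferences: a receiver who always picks the same clip produces identical “as target” and “as decoy” rates.

```python
from statistics import mean


def self_control_rates(sessions, clips):
    """For one judging set, return (mean rate at which a clip is chosen when it
    is the target, mean rate at which it is chosen when it is a decoy),
    averaged over the clips. `sessions` is a list of (target, chosen) pairs."""
    as_target, as_decoy = [], []
    for clip in clips:
        when_target = [chosen == clip for target, chosen in sessions if target == clip]
        when_decoy = [chosen == clip for target, chosen in sessions if target != clip]
        # Assumption: a clip with no sessions in a role contributes nothing to it.
        if when_target:
            as_target.append(sum(when_target) / len(when_target))
        if when_decoy:
            as_decoy.append(sum(when_decoy) / len(when_decoy))
    return mean(as_target), mean(as_decoy)


clips = ("A", "B", "C", "D")

# Pure content bias, no psi: the receiver always chooses clip A.
biased = [("A", "A"), ("B", "A"), ("C", "A"), ("D", "A")]
print(self_control_rates(biased, clips))   # (0.25, 0.25): the bias cancels

# Perfect psi: the receiver always chooses the actual target.
perfect = [(c, c) for c in clips]
print(self_control_rates(perfect, clips))  # (1.0, 0.0)
```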


Sequence-Related Response Bias 


Hyman is also concerned about the randomization of the 
judging sequence 


because we can expect strong systematic biases during the judging 
procedure. The fact that the items to be judged have to be presented 
sequentially, when combined with what we know about subjective 
validation. . . would lead us to expect a strong tendency to select 
the first or second items during the judging series. 


Hyman’s hypothesis is correct: As shown in Table 1, receivers do display a position bias in their judgments, χ²(3, N = 354) = 8.64, p < .05, tending to identify as targets clips appearing either first or last in the judging sequence. Moreover, the actual distribution of targets across the judging positions also departs significantly from a uniform distribution, χ²(3, N = 354) = 7.83, p < .05, with targets appearing most frequently in the third position.
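For readers who wish to check tests of this kind, here is a minimal standard-library sketch of a chi-square goodness-of-fit test against a uniform distribution over the four judging positions. The article reports only rounded proportions, not raw counts per position, so the counts below are illustrative and are not the study’s data. For 3 degrees of freedom the upper-tail probability has a closed form, P(χ² > x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2), which avoids needing a statistics library.

```python
import math


def chi_square_uniform(counts):
    """Pearson chi-square statistic for observed counts vs. a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Illustrative counts of sessions per judging position (not the study's data).
counts = [100, 80, 80, 94]
stat = chi_square_uniform(counts)
print(f"chi2(3, N = {sum(counts)}) = {stat:.2f}, p = {chi2_sf_df3(stat):.3f}")
```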

To determine whether the conjunction of these two unequal distributions might contribute artifactually to the hit rate, one can combine the observed frequencies to derive an empirical baseline. As shown in Table 1, each proportion in the second column can be multiplied by the corresponding proportion in the third column to yield the hit rate expected if there were no psi effect. As shown, the expected hit rate across all four positions is 24.7%.

The pertinent fact here is that this is lower than the 25% that would have been obtained if the target positions had been uniformly distributed across the sessions. In other words, the conjunction of receivers’ position biases with the imperfect randomization of target positions works against successful psi performance in these data. Again, inadequate randomization has not contributed artifactually to the hit rates.
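The computation of the empirical baseline from Table 1 is simple enough to check directly. The Python sketch below multiplies the rounded proportions in the table position by position and sums the products; because it starts from rounded proportions, individual products can differ from the tabled values in the last digit (position 3 gives 6.8 rather than 6.7), presumably because the table was computed from unrounded proportions. The second call shows the comparison point used in the text: with targets spread uniformly over positions, the baseline is 25% whatever the receivers’ position preferences.

```python
def expected_hit_rate(p_selected, p_appeared):
    """Hit rate expected with no psi: sum over positions of
    P(receiver selects position) * P(target appears in position)."""
    return sum(s * a for s, a in zip(p_selected, p_appeared))


# Rounded proportions from Table 1 (positions 1-4).
selected = [0.30, 0.20, 0.22, 0.28]
appeared = [0.25, 0.24, 0.31, 0.20]

print(round(100 * expected_hit_rate(selected, appeared), 1))    # 24.7
print(round(100 * expected_hit_rate(selected, [0.25] * 4), 1))  # 25.0
```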


Alternative Randomizing Strategies?


Hyman suggests that “one way to prevent response biases from distorting the hit rate is to use a randomizing procedure that makes sure that each item within a target pool occurs equally often.” Coming from a critic as sophisticated as Hyman, this is a very puzzling suggestion, because he appears to be suggesting some variant of sampling without replacement, a procedure that would virtually guarantee response-bias artifacts. For example, if receivers tend to avoid selecting targets that appeared in previous sessions, this response bias would coincide with the actual diminishing probabilities that a previously seen target would reappear. The experimenters, who participate in many sessions and discuss them with one another, are in an even better position to detect and possibly to exploit the diminishing probabilities of target repetition. Sampling without replacement is precisely what enables card counters to improve their odds at blackjack.
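The card-counting point can be made concrete with a small calculation. Suppose, purely hypothetically, that targets were drawn without replacement in blocks of four sessions, so that each clip in a judging set served as the target exactly once per block, and that someone who knew which clips had already been targets in the current block (for example, an experimenter who ran the earlier sessions) always guessed among the clips not yet used. The exact-fraction sketch below computes that guesser’s expected hit rate; with four clips it is 25/48, or about 52%, with no psi at all, whereas under sampling with replacement the same strategy stays at 25%.

```python
from fractions import Fraction


def block_counter_hit_rate(n_clips: int) -> Fraction:
    """Expected hit rate of a guesser who, within a block where each of n_clips
    clips is the target exactly once (sampling without replacement), always
    picks uniformly among the clips not yet used as targets in that block."""
    total = Fraction(0)
    for used in range(n_clips):
        # n_clips - used candidates remain, and the target is one of them.
        total += Fraction(1, n_clips - used)
    return total / n_clips


rate = block_counter_hit_rate(4)
print(rate, round(float(rate), 3))  # 25/48 0.521
```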

Alternatively, perhaps Hyman is advocating a procedure in which the experiment continues until each clip within a judging set appears as a target a predesignated minimum number of times. For purposes of analysis, the investigator then randomly discards excess sessions until the target frequencies are equalized at that minimum number. This would solve the response-bias problem but would be enormously wasteful. Suppose, for example, that only 4 sessions from each judging set would have to be discarded, on average, to equalize the target frequencies. With 40 judging sets, the investigator would end up discarding 160 sessions, equal to nearly half of the sessions that took Honorton and his colleagues 6½ years to collect! Only a study with many fewer judging sets could reasonably implement this strategy.
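The wastefulness of the equalization strategy is straightforward arithmetic. In the hypothetical sketch below, each judging set is represented by the number of times each of its four clips served as the target; the sessions that must be discarded are the total minus four times the smallest count. The example uses invented counts that require discarding 4 sessions per set, matching the illustrative assumption in the text, and compares the resulting 160 sessions with the 329 sessions of the 10 basic experiments.

```python
def sessions_to_discard(target_counts):
    """Sessions to drop so every clip in the set was the target equally often,
    keeping the minimum count for each clip."""
    return sum(target_counts) - len(target_counts) * min(target_counts)


# Invented counts for one judging set: 8 sessions, smallest count 1.
one_set = [1, 2, 2, 3]
per_set = sessions_to_discard(one_set)   # 4
total = 40 * per_set                     # 160 across 40 judging sets
print(total, f"{total / 329:.0%}")       # 160 49%
```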


Hit Rates as a Function of Target Repetition 


In his post hoc excursion through the autoganzfeld data, Hyman uncovered an unexpected positive relationship between hit rates and the number of times targets had been targets in previous sessions. (Ironically, Hyman has been one of the most outspoken critics of parapsychologists who search through their data without specific hypotheses and then emerge with unexpected “findings.”)

If this finding is reliable and not just a fluke of post hoc exploration, then it is difficult to interpret because target repetition is confounded with the chronological sequence of sessions: Higher repetitions of a target necessarily occur later in the sequence than lower repetitions. In turn, the chronological sequence of sessions is confounded with several other variables, including more experienced experimenters, more “talented” receivers (e.g., Juilliard students and receivers being retested because of earlier successes), and methodological refinements introduced in the course of the program in an effort to enhance psi performance (e.g., experimenter “prompting”).

Table 1
Proportion of Sessions in Which Each Clip Was Selected as the Target and Proportion in Which It Appeared as the Target

Position in         Selected     Appeared     Expected
judging sequence    as target    as target    hit rate (%)
1                   .30          .25           7.5
2                   .20          .24           4.9
3                   .22          .31           6.7
4                   .28          .20           5.6
Total               1.00         1.00         24.7

Note. N = 354 sessions.

Again, however, Hyman’s major concern is that this pattern 
might reflect an interaction between inadequate target random- 
ization and possible response biases on the part of those receiv- 
ers or experimenters who encounter the same judging set more 
than once. This seems highly unlikely. In the entire database, 
only 8 subjects saw the same judging set twice, and none of them 
performed better on the repetition than on the initial session. 
Similar arithmetic applies to experimenters: On average, each 
of the eight experimenters encountered a given judging set only 
1.03 times. The worst case is an experimenter who encountered the same judging set 6 times over the 6½ years of the program.
These six sessions yielded three hits, two of them in the first two 
sessions. 

At the end of his discussion, Hyman wonders whether this 
relationship between target repetition and hit rates is “due to an 
artifact or [does it] point to some new, hitherto unrecognized 
property of psi?” If it should turn out to be the latter, then I 
believe it only appropriate that parapsychologists reward his 
serendipity by calling it the Hyman Effect.